Epidemiology and clinical characteristics of interstitial lung disease in patients with rheumatoid arthritis from the JointMan database

Interstitial lung disease (ILD) is a progressive fibrotic disease associated with rheumatoid arthritis (RA); real-world data for evaluating RA-associated ILD (RA-ILD) are limited. We evaluated prevalence, time to onset, clinical characteristics, and prognostic factors in patients diagnosed with RA (n = 8963), with and without ILD, in the Discus Analytics JointMan database (2009–2019). ILD prevalence was 4.1% (95% confidence interval 3.7–4.5); > 90% of patients with ILD received their ILD diagnosis after their RA diagnosis (mean time to onset 3.3 years). At baseline, compared with patients in the RA cohort, a higher proportion of patients with RA-ILD were older (> 65 years) and male, and had a history of chronic obstructive pulmonary disease (COPD). Patients in the RA-ILD cohort were more likely than patients without ILD to have more severe RA characteristics and joint evaluation findings, both at baseline and before/after ILD diagnosis. In this large, real-world database, patients with (vs without) ILD had a higher burden of RA characteristics. Previously established risk factors for RA-ILD were confirmed (age, baseline COPD, anti-cyclic citrullinated peptide positivity, C-reactive protein, and Clinical Disease Activity Index score); thus, recognition of these factors and tracking of routine disease activity metrics may help identify patients at higher risk of RA complications and lead to improved diagnosis and earlier treatment.


Methods
Data source. Patient demographics and disease characteristics were retrospectively analyzed following data extraction from the Discus Analytics JointMan database, a large US electronic medical records-based dataset initiated in March 2009. The JointMan database includes > 17,000 rheumatology patients covered by commercial, Medicare, or Medicaid insurance health plans. Practices across the following eight states are included: Washington, New York, Oregon, Florida, Georgia, California, Wisconsin, and Kentucky. Patient data were collected at rheumatology centers and were de-identified prior to analysis. In addition to electronic medical record data, the JointMan user interface collects clinical outcomes recorded by physicians at the time of the encounter.

Patient population.
Patients were included if they were aged ≥ 18 years at the initial visit with a rheumatologist participating in the JointMan network, had a provider-selected diagnosis of RA between January 1, 2009 and September 20, 2019, and had ≥ 1 visit after the initial visit date. Patients were excluded if their initial encounter occurred after RA diagnosis or if they had a drug-induced ILD diagnosis [International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes J70.2 and J70.4] at any time during the study period. Patients were assigned to either the RA cohort (patients with confirmed RA but no diagnosis of ILD during the study period) or the RA-ILD cohort (patients with a provider diagnosis of non-drug-induced ILD on or after the initial RA diagnosis date). The RA index date was defined as the first RA diagnosis date recorded in the JointMan database (provided by the rheumatologist).
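The eligibility and cohort rules above can be expressed as a small filter. The sketch below is a minimal illustration, not the study's code: the `Patient` record layout and all field names are hypothetical (the JointMan schema is not described here), and patients whose only ILD diagnosis precedes the RA index date are left unassigned, an assumption made because neither cohort definition covers them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

STUDY_START = date(2009, 1, 1)
STUDY_END = date(2019, 9, 20)
DRUG_INDUCED_ILD_CODES = {"J70.2", "J70.4"}  # ICD-10-CM codes named in the text


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age_at_first_visit: int
    first_visit: date
    ra_index: date                # first RA diagnosis date recorded by the rheumatologist
    visits_after_first: int
    ild_dx_dates: list[date] = field(default_factory=list)  # non-drug-induced ILD diagnoses
    icd10_codes: set[str] = field(default_factory=set)      # all codes in the study period


def assign_cohort(p: Patient) -> Optional[str]:
    """Return "RA-ILD", "RA", or None (not eligible or not assignable)."""
    # Inclusion criteria
    if p.age_at_first_visit < 18:
        return None
    if not (STUDY_START <= p.ra_index <= STUDY_END):
        return None
    if p.visits_after_first < 1:
        return None
    # Exclusion criteria
    if p.first_visit > p.ra_index:  # initial encounter occurred after RA diagnosis
        return None
    if p.icd10_codes & DRUG_INDUCED_ILD_CODES:
        return None
    # Cohort assignment
    if any(d >= p.ra_index for d in p.ild_dx_dates):
        return "RA-ILD"
    if not p.ild_dx_dates:
        return "RA"
    return None  # assumption: ILD recorded only before the RA index date is left unassigned


print(assign_cohort(Patient(54, date(2012, 1, 9), date(2012, 1, 9), 4, [date(2015, 2, 1)])))
```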
The overall study population comprised patients who were followed from the day after the RA index date to the last patient encounter date or the end of the study (September 20, 2019), whichever occurred first. A subanalysis was conducted in patients grouped by ILD diagnosis status. For the subanalysis population, the ILD diagnosis index was defined as the first date of ILD diagnosis recorded in the JointMan database (for patients in the RA-ILD cohort), and patient characteristics were described for the 90-day periods before and after the ILD diagnosis index. For patients without ILD, the index date was assigned based on the distribution of the number of days from RA diagnosis to ILD diagnosis in the RA-ILD cohort; characteristics were described for the 90-day periods before and after this index date (Supplementary Fig. S1).
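The follow-up window and the index date assigned to patients without ILD can be sketched as follows. This is an illustration under stated assumptions: drawing a single delay at random from the RA-ILD cohort's observed RA-to-ILD delays is one reading of "based on the distribution of the number of days", the function names are hypothetical, and excluding the index day itself from both 90-day windows is an arbitrary convention that the text does not specify.

```python
import random
from datetime import date, timedelta

STUDY_END = date(2019, 9, 20)
WINDOW = timedelta(days=90)
ONE_DAY = timedelta(days=1)


def follow_up_window(ra_index: date, last_encounter: date) -> tuple[date, date]:
    """Follow-up runs from the day after the RA index date to the last encounter
    or the end of the study, whichever occurs first."""
    return ra_index + ONE_DAY, min(last_encounter, STUDY_END)


def impute_index(ra_index: date, ra_ild_delays_days: list[int], rng: random.Random) -> date:
    """Assign a pseudo 'ILD index' to a patient without ILD by drawing one
    RA-to-ILD delay (in days) from the RA-ILD cohort's empirical distribution."""
    return ra_index + timedelta(days=rng.choice(ra_ild_delays_days))


def pre_post_windows(index: date) -> tuple[tuple[date, date], tuple[date, date]]:
    """Inclusive 90-day windows immediately before and after the index date
    (assumption: the index day itself belongs to neither window)."""
    pre = (index - WINDOW, index - ONE_DAY)
    post = (index + ONE_DAY, index + WINDOW)
    return pre, post


# Usage with a hypothetical patient without ILD and three hypothetical RA-ILD delays
rng = random.Random(2019)  # fixed seed so the imputation is reproducible
index = impute_index(date(2014, 3, 1), [410, 840, 1200], rng)
print(follow_up_window(date(2014, 3, 1), date(2021, 1, 1)))
print(index, pre_post_windows(index))
```

Seeding the random generator makes the imputed index reproducible across runs, which matters when the same comparison is rerun.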
Primary endpoints. The primary endpoints, assessed in the overall study population, were prevalence and time to onset of ILD. Prevalence was defined as the number of patients with RA and a diagnosis of ILD divided by the total number of patients with RA during the study period. Time to onset of ILD was defined as the time from initial RA diagnosis to the first observed non-drug-induced ILD diagnosis.
Exploratory endpoints. Exploratory endpoints, assessed in the exploratory analysis population, included baseline demographics, comorbidities, RA characteristics, and overall RA disease activity in the RA cohort compared with the RA-ILD cohort. RA characteristics included joint stiffness, erosions, extra-articular disease, anti-cyclic citrullinated peptide (anti-CCP) antibodies, joint swelling, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), and Clinical Disease Activity Index (CDAI). CDAI remission was defined as a score ≤ 2.8; CDAI low, moderate, and high disease activity were defined as scores of > 2.8 to 10, > 10 to 22, and > 22, respectively 19 . Simplified Disease Activity Index (SDAI) remission was defined as a score ≤ 3.3; SDAI low, moderate, and high disease activity were defined as scores of > 3.3 to 11, > 11 to 26, and > 26, respectively 19 . Disease Activity Score in 28 www.nature.com/scientificreports/ […] activity scores were defined as > 3 to 6, > 6 to 12, and > 12, respectively 21 . These variables were assessed as potential predictors of RA-ILD.
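The CDAI and SDAI cut-offs above map a score onto four activity categories. A minimal sketch, assuming the boundaries are applied exactly as written (each upper bound inclusive); the RAPID3 cut-offs are omitted because the corresponding text is incomplete, and the helper names are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of remission, low, and moderate activity; anything above is high.
CUTOFFS = {
    "CDAI": (2.8, 10.0, 22.0),
    "SDAI": (3.3, 11.0, 26.0),
}
CATEGORIES = ("remission", "low", "moderate", "high")


def activity_category(index: str, score: float) -> str:
    """Map a CDAI or SDAI score to its disease activity category."""
    if score < 0:
        raise ValueError("activity scores are non-negative")
    # bisect_left places a score equal to a cut-off in the lower category,
    # matching the "<= upper bound" definitions (e.g. CDAI 10 is "low").
    return CATEGORIES[bisect_left(CUTOFFS[index], score)]


def moderate_to_high(index: str, score: float) -> bool:
    """Binary indicator of moderate-to-high activity, as used for CDAI in the risk analysis."""
    return activity_category(index, score) in ("moderate", "high")


print(activity_category("CDAI", 10.0), activity_category("SDAI", 26.5))
```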

Subanalysis endpoints.
For patients included in the subanalysis population, CDAI and RAPID3 scores, swollen and swollen28 joint counts, the number of rheumatologist encounters, and treatment utilization before and after the ILD diagnosis index were also assessed. The swollen and swollen28 joint counts are components of the DAS/DAS28 score: the swollen joint count is an assessment of 28 or more (up to 44) joints, whereas the swollen28 joint count is an assessment of only 28 pre-selected joints 22 .
Statistical analysis. The prevalence (95% confidence interval [CI]) of the first observed ILD diagnosis during follow-up was calculated. Time to ILD diagnosis was examined using unadjusted Kaplan-Meier survival curves. Continuous baseline variables were compared using Student's t-test, and percentages for categorical and binary baseline variables were compared using the Chi-square test. Potential predictors of RA-ILD were analyzed with a Cox regression model. Patient demographic data and comorbidities were collected at baseline and were controlled for in the Cox model. RA characteristics were identified at and after the initial RA diagnosis and were controlled for as time-varying covariates in the Cox model. The final covariate lists were based on clinical rationale and model fit; hazard ratios, 95% CIs, and p values were reported for each covariate. Statistical significance for model inclusion was set at p < 0.05.
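Entering RA characteristics as time-varying covariates in a Cox model usually means splitting each patient's follow-up into (start, stop] intervals within which each covariate is constant. The sketch below shows only that data-preparation step, for a single hypothetical binary covariate; it does not fit the model. Rows of this shape are what counting-process Cox implementations expect, for example `coxph()` with `Surv(start, stop, event)` in the R survival package. The `Interval` type and `split_follow_up` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float    # years since the RA index date
    stop: float
    event: int      # 1 if ILD was diagnosed at `stop`, else 0
    covariate: int  # covariate value in force during (start, stop]


def split_follow_up(end_time: float, ild_observed: bool,
                    changes: list[tuple[float, int]]) -> list[Interval]:
    """Split one patient's follow-up at the times a covariate changes value.

    `changes` holds (time, new_value) pairs; a baseline value at time 0 is
    required and change times must be distinct.
    """
    changes = sorted(changes)
    times = [t for t, _ in changes]
    if not changes or times[0] != 0.0:
        raise ValueError("a baseline value at time 0 is required")
    if len(set(times)) != len(times):
        raise ValueError("change times must be distinct")
    rows = []
    for i, (t, value) in enumerate(changes):
        if t >= end_time:
            break  # changes after the end of follow-up are ignored
        next_t = changes[i + 1][0] if i + 1 < len(changes) else end_time
        stop = min(next_t, end_time)
        # Only the final interval can carry the ILD event.
        rows.append(Interval(t, stop, int(ild_observed and stop == end_time), value))
    return rows


# Hypothetical patient: anti-CCP turns positive at 1.5 years, ILD diagnosed at 4.0 years
for row in split_follow_up(4.0, True, [(0.0, 0), (1.5, 1)]):
    print(row)
```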
The number and percentage of patients with rheumatologist visits, treatment utilization, and each disease activity score in the pre- and post-index periods were calculated. P values comparing disease activity score categories between the pre- and post-index periods were obtained from Fisher's exact test or the Chi-square test, with statistical significance set at p < 0.05.
Ethical approval. This study was conducted in accordance with the International Society for Pharmacoepidemiology Guidelines for Good Pharmacoepidemiology Practices and applicable regulatory requirements 23 . The study protocol was reviewed by the internal BMS Observational Protocol Review Committee (OPRC). No identifiable protected health information was extracted or accessed from the database during the study; therefore, the BMS OPRC confirmed that this analysis did not require ethical oversight. Additionally, the study did not involve the collection, use, or transmittal of individually identifiable data, and data were collected in the setting of usual patient care. Informed consent from the study participants was not required because the dataset used in this observational study consisted of de-identified secondary data released for research purposes.

Results
Overall study population, prevalence, and time to onset of ILD. In the overall study population, a total of 8963 patients with RA were identified from January 1, 2009 to September 20, 2019. The prevalence (95% CI) of ILD in the overall population of patients with RA was 4.1% (3.7-4.5%).
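The reported interval can be reproduced from the counts given in the text (367 patients with RA-ILD among 8963 patients with RA). The CI method is not stated; a normal-approximation (Wald) interval is assumed here, and with these counts it agrees with the published 3.7-4.5%. The function name is hypothetical.

```python
import math


def prevalence_wald_ci(cases: int, total: int,
                       z: float = 1.959963984540054) -> tuple[float, float, float]:
    """Prevalence with a normal-approximation (Wald) 95% confidence interval."""
    p = cases / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to [0, 1], since the Wald interval can overshoot near the boundaries.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


p, lo, hi = prevalence_wald_ci(367, 8963)
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")  # prints: 4.1% (95% CI 3.7-4.5%)
```

With rarer outcomes or smaller samples, a Wilson or exact interval would usually be preferred; at n = 8963 the choice makes little difference.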
Of the patients in the RA-ILD cohort, 91.8% (n = 337/367) had their first ILD diagnosis after their RA diagnosis. The mean time to onset of ILD after RA diagnosis was 3.3 years (median 2.3 years; Fig. 1).
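Time to onset was examined with unadjusted Kaplan-Meier curves. A minimal product-limit estimator in plain Python is sketched below to show how such a curve is built; it is not the software used in the study, and the input data are hypothetical, with patients in whom ILD was not observed censored at the end of follow-up.

```python
from collections import Counter


def kaplan_meier(durations: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Product-limit estimate of S(t) = P(no ILD diagnosis by time t).

    durations: time from RA diagnosis to ILD diagnosis or censoring (years)
    events:    1 if ILD was diagnosed at that time, 0 if censored
    Returns (time, S(time)) at each distinct event time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have equal length")
    ild_events = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk, surv, curve = len(durations), 1.0, []
    for t in sorted(leaving):
        d = ild_events.get(t, 0)
        if d:
            surv *= 1 - d / at_risk  # patients censored at t still count as at risk at t
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Hypothetical example: five patients, two censored
print(kaplan_meier([1.0, 2.0, 2.0, 3.0, 4.0], [1, 1, 0, 1, 0]))
```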
Baseline patient demographics and disease characteristics. In the exploratory analysis population, there were a total of 5817 patients; 96.5% (n = 5612) had RA and no comorbid ILD diagnosis (RA cohort) and 3.5% (n = 205) had RA-ILD (RA-ILD cohort). Compared with the RA cohort, a significantly higher proportion of patients in the RA-ILD cohort were older, male, and white, had Medicare as their primary insurance category, and had a history of chronic obstructive pulmonary disease (COPD) (Table 1). The proportion of patients with a smoking status of 'yes' was similar between cohorts. Patients in the RA-ILD cohort also had more severe and more active RA at baseline than patients in the RA cohort. Most RA characteristics or manifestations were significantly more prevalent in the RA-ILD cohort (rheumatoid factor positivity [RF+], rheumatoid nodules, erosions, extra-articular disease, and anti-CCP positivity). In addition, baseline ESR was significantly higher in the RA-ILD cohort (Table 1). Patients in the RA-ILD cohort had higher mean baseline CDAI, SDAI, DAS28 (CRP), and DAS28 (ESR) scores than patients in the RA cohort; RAPID3 scores were similar between cohorts (Table 2). A higher proportion of patients in the RA-ILD cohort than in the RA cohort were in the high disease activity category for SDAI, DAS28 (CRP), and DAS28 (ESR).
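Proportions such as those above were compared with the Chi-square test or Fisher's exact test. For a 2 × 2 table both can be computed with the Python standard library, as sketched below. The rule for choosing between them (Fisher's test whenever an expected cell count is below 5) is a common convention assumed here, not one stated in the text; the function names and example counts are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]; df = 1."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value: sum of hypergeometric probabilities of all
    tables (margins fixed) that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def compare_proportions(a: int, b: int, c: int, d: int) -> tuple[str, float]:
    """Assumed rule: Fisher's exact test if any expected count is < 5, else chi-square."""
    n = a + b + c + d
    expected = [r * k / n for r in (a + b, c + d) for k in (a + c, b + d)]
    if min(expected) < 5:
        return "Fisher", fisher_exact_2x2(a, b, c, d)
    return "Chi-square", chi_square_2x2(a, b, c, d)[1]


print(compare_proportions(3, 1, 1, 3))
print(compare_proportions(30, 70, 50, 50))
```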
Risk factors for RA-ILD. Potential predictors of RA-ILD diagnosis were assessed in the exploratory analysis population (patients with 6 months of follow-up). Older age (≥ 65 years old) and a history of COPD at baseline were shown to be risk factors for developing ILD (Fig. 2). Several time-varying covariates (anti-CCP positivity, CRP > 5 mg/L, and a moderate-to-high CDAI score) were also shown to be predictive of developing ILD. No other covariates were significant based on evaluation of confidence intervals.
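Each Cox model coefficient β (a log hazard ratio) with standard error SE yields the hazard ratio exp(β) and the Wald 95% CI exp(β ± 1.96·SE); a two-sided Wald test at p < 0.05 rejects exactly when that interval excludes 1, which links a covariate's confidence interval to its significance. The coefficients in the sketch are hypothetical, not estimates from this study (those are shown in Fig. 2).

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def hazard_ratio_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Hazard ratio and Wald 95% CI from a Cox coefficient (log hazard ratio)."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p value for H0: beta = 0 (hazard ratio = 1)."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


# Hypothetical coefficients for illustration only
for name, beta, se in [("covariate A", 0.69, 0.20), ("covariate B", 0.10, 0.15)]:
    hr, lo, hi = hazard_ratio_ci(beta, se)
    excludes_one = lo > 1 or hi < 1
    print(f"{name}: HR {hr:.2f} ({lo:.2f}-{hi:.2f}), "
          f"p = {wald_p_value(beta, se):.3f}, CI excludes 1: {excludes_one}")
```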
Subanalysis: comparison of outcomes for patients in the RA and RA-ILD cohorts before and after ILD diagnosis. To evaluate RA disease activity, rheumatologist encounters, and treatments in the RA-ILD versus RA cohorts, data from the 90-day periods before and after the ILD diagnosis index date were compared. In total, 7150 patients with RA only and 240 patients with RA-ILD had data in both the 90 days before and the 90 days after the ILD diagnosis index.
For both patient cohorts, missingness of disease severity measures was lower in the post-index period than in the pre-index period (for example, the proportion of patients with a CDAI score post- versus pre-index was 94.6% versus 13.3% in the RA-ILD cohort and 49.6% versus 24.7% in the RA cohort; Table 3). In the post-index period, ≥ 90% of patients in the RA-ILD cohort had CDAI or RAPID3 scores reported, compared with ~ 50% of patients in the RA cohort. In the post-index period, the proportion of patients in each severity category was similar between the RA-ILD and RA cohorts. Approximately 97% of patients in the RA-ILD cohort had a swollen or swollen28 joint count in the post-index period, compared with 52% of patients in the RA cohort (Fig. 3). Patients in the RA-ILD cohort had more swollen joints in the post-index period than those in the RA cohort (Fig. 3).
For both the pre- and post-index periods, a greater proportion of patients in the RA-ILD cohort than in the RA cohort had rheumatologist visits. In the RA cohort, the proportion of patients with rheumatologist visits was similar in the pre- and post-ILD diagnosis index periods: 69.8% (n = 4990/7150) versus 68.2% (n = 4877/7150), respectively. However, in the RA-ILD cohort, the proportion of patients with rheumatologist visits increased from the pre- to the post-ILD diagnosis index period: 74.2% (n = 178/240) versus 99.6% (n = 239/240), respectively.
For both the pre- and post-index periods, a greater proportion of patients in the RA-ILD cohort than in the RA cohort used glucocorticoids/disease-modifying antirheumatic drugs (DMARDs) and biologics. In the RA-ILD cohort, similar proportions of patients in the post- versus pre-ILD diagnosis index periods used glucocorticoids/DMARDs (82% vs. 83%) and biologics (48% vs. 45%). However, in the RA cohort, a lower proportion of patients used glucocorticoids/DMARDs (58% vs. 74%) and biologics (31% vs. 35%) in the post-ILD diagnosis index period than in the pre-ILD diagnosis index period.

Discussion
In this large, real-world study, using data from the United States-based Discus Analytics JointMan database, the prevalence of RA-ILD was 4.1% and the mean time to onset of ILD after RA diagnosis was 3.3 years. We identified several risk factors for RA-ILD: age (≥ 65 years), COPD at baseline, anti-CCP positivity, CRP > 5 mg/L, and a moderate-to-high CDAI score. Patients with RA-ILD have increased morbidity compared with patients with RA without ILD 3 , which is supported by our results showing that patients with RA-ILD had more active RA at baseline and after ILD diagnosis. Consequently, patients with RA-ILD may require more clinical consultation.
The prevalence of RA-ILD ascertained in our study (4.1%) falls towards the lower end of the previously reported range; however, those studies used differing methodologies and ILD definitions [5][6][7][8][9] . A recent United States-based cohort study using Medicare claims data from > 500,000 patients between 2008 and 2017 estimated the baseline prevalence of RA-ILD to be 2.0% and the overall prevalence (RA-ILD present at baseline or developing during the analysis period) to be approximately 5.0%, in line with our results 24 . A study similar to ours, using the United States-based Truven Health MarketScan Commercial and Medicare Supplemental health insurance databases, reported an RA-ILD prevalence in the US of 3.2 to 6.0 cases per 100,000 people 4 . A retrospective review of patient data in Jordan found the prevalence of RA-ILD among 210 patients to be 3.7% 25 . Notably, the study reporting the highest prevalence in the range (58%) was a small analysis of 36 patients with early RA (duration < 2 years), and its estimate included both patients with "clinically significant ILD" and those with "abnormalities compatible with ILD but no clinically significant ILD" 9 . In contrast, in our study, patients were classified as having RA-ILD only if a diagnosis of ILD was definitive.
In this study, assessment of the clinical characteristics of patients in the RA and RA-ILD cohorts showed that patients with ILD were more likely to be older and male, to have a history of COPD, and to have more prominent RA disease characteristics (a higher proportion of patients were RF+ or anti-CCP+, or had rheumatoid nodules, erosions, extra-articular disease, or swelling, and baseline ESR was higher). A higher proportion of patients with RA-ILD than in the RA cohort had Medicare insurance; this can be at least partially explained by the age difference, as a larger proportion of patients with RA-ILD were over the age of 65. Potential risk factors for RA-ILD were further analyzed with a Cox regression model and, in addition to older age and seropositivity, which are already established risk factors [16][17][18][25][26][27][28][29] , we confirmed baseline COPD 30 , a baseline moderate-to-high CDAI score, and CRP > 5 mg/L as risk factors. Although smoking is an established risk factor for RA-ILD 25,31 , in our analysis, differences in baseline smoking prevalence were not statistically significant. However, identification of smoking exposure in patient data is limited by missingness, and there may have been a large proportion of false negatives, which would limit reliability. It should further be noted that although COPD and ILD have distinct pathophysiologies, they share overlapping risk factors and so may develop either simultaneously or successively 30,32 . Disease activity, measured with DAS28 33 or CDAI 34 , has previously been identified as a risk factor for RA-ILD.
A retrospective analysis of data from patients (n = 1419) with early/mild or severe interstitial lung abnormalities in the Brigham and Women's RA Sequential Study revealed that those with high or moderate disease activity (defined by DAS28) had an increased risk of developing RA-ILD (compared with patients in remission or with low disease activity) 33 . A smaller (n = 118) case-control study showed that a CDAI score > 28 was associated with the presence of RA-ILD 34 . Previous studies have also identified baseline CRP level as a risk factor for RA-ILD: CRP > 10 mg/L or "higher" baseline levels 35,36 . Our analysis refines these further by identifying baseline CRP > 5 mg/L to be predictive of RA-ILD. The identification of new risk factors for RA-ILD may help physicians diagnose and treat patients earlier in the course of the disease.
Our subanalysis of outcomes before versus after ILD diagnosis provides some insight into RA disease severity and healthcare utilization (treatments, encounters) for patients with RA who develop ILD. Based on swollen joint counts, patients with RA-ILD appeared to have worse RA symptoms after ILD diagnosis than patients who did not develop ILD. It should be noted that more patients in the RA cohort had missing disease severity data, which may be an artifact of routine assessments being scheduled only 1-2 times per year. Missing data may also reflect that patients with low disease activity or in remission consult their physician less frequently than patients with moderate/high disease activity. Thus, more complete disease activity data may reveal a greater disparity in RA symptom control between patients with RA who develop ILD and those […]
Table footnote: […] and RA-ILD cohort n = 240. c In the RA cohort (patients without ILD), a stochastically determined modifier was imputed and added to the initial RA diagnosis date based on the frequency distribution of days for patients in the RA-ILD cohort, and characteristics were described for the 90-day periods before and after. ILD, interstitial lung disease; RA, rheumatoid arthritis; RA-ILD, RA-associated ILD; SD, standard deviation.
This was a large analysis of real-world data collected by rheumatologists across several regions of the United States. The comprehensiveness of the JointMan database, which incorporates rheumatology encounters, rheumatology-specific laboratory results, clinical evaluations, and prescriptions within the JointMan network for patients covered by commercial, Medicare, and Medicaid insurance plans, allows for longitudinal analysis of RA and related treatments and conditions. Other strengths are the integration of live electronic patient records, allowing continuous coverage, and membership of a rheumatology network, which suggests that participating clinicians are knowledgeable about disease surveillance practice.
Compared with randomized clinical trials, real-world studies are important for providing evidence that is generalizable to different populations and are useful for assessing specific characteristics of patient populations, risk factors for a pre-defined outcome, and comparative effectiveness 37 .
Despite the above strengths, the analysis has some limitations. Coding errors may have occurred in the patient data, and in some instances, diagnostic codes may have been entered as rule-out criteria rather than to record actual disease. Due to the nature of the study design, the symptoms and tests used to reach a diagnosis were not captured. Specific validation studies assessing the codes for RA are lacking; however, the validity of ICD-9-CM and ICD-10-CM codes relative to chart review data has been shown to be comparable for rheumatic disease 38 . Additionally, encounters outside the JointMan network, such as inpatient visits, emergency department visits, and visits with non-rheumatology physicians, are not captured. Use of the JointMan database also varied between sites and over time. Although data were collected across several regions of the United States, the JointMan database population was limited to eight states, with most of the population located in Washington. As mentioned, our dataset also had different levels of missing data for swollen joint counts and disease severity scores in the RA and RA-ILD cohorts. Missing data may have been driven by lower disease activity, especially for patients in the RA cohort. Furthermore, as this study covers patients from 2009 to 2019, clinical assessment of disease activity scores may have become more common over the study period, so lower assessment rates in earlier years may have contributed to missing data.
In conclusion, this work further describes the disease and natural history of patients with the debilitating conditions of RA and ILD. The prevalence of RA-ILD in this large, real-world study using data from the United States-based JointMan database was 4.1%. This study provides insight into the increased burden of disease among patients with RA-ILD versus RA without ILD; RA disease activity may be worse after ILD diagnosis compared with the pre-ILD diagnosis index period and compared with patients with RA alone. Several previously established risk factors for developing ILD were confirmed, including older age, COPD at baseline, anti-CCP positivity, CRP > 5 mg/L, and a moderate-to-high CDAI score. Recording and tracking routine clinical disease activity metrics may help identify patients at higher risk of RA complications. Recognition of the risk factors underscored here may lead to early diagnosis of RA-ILD and quicker treatment initiation, leading to better clinical outcomes for these patients.

Data availability
The data that support the findings of this study are available from Bristol Myers Squibb, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Bristol Myers Squibb. Data requests are reviewed by an independent review committee, which makes the final decision on each request. The Bristol Myers Squibb policy on data sharing may be found at https://www.bms.com/researchers-and-partners/independent-research/data-sharing-request-process.html.